Add SSE2 fillers #2566

Merged
merged 5 commits into from
Dec 11, 2023

@itzpr3d4t0r itzpr3d4t0r commented Nov 12, 2023

This PR continues #2382 and adds SSE2 implementations of the fillers for all blend modes. This should improve performance on ARM and on older x86 CPUs that don't support AVX2 instructions (AVX2 PR: #2565).
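For readers unfamiliar with the blend modes, here is a minimal pure-Python sketch of what each flag does to a single 8-bit channel. It only illustrates the per-channel arithmetic, not the SSE2 code in this PR. The helper names `blend_channel` and `blend_pixel` are hypothetical, and the `BLEND_MULT` rounding is an assumption (pygame's exact rounding may differ slightly).

```python
def blend_channel(mode: str, dst: int, src: int) -> int:
    """Blend one 8-bit channel value (0-255) of src into dst."""
    if mode == "BLEND_ADD":
        return min(dst + src, 255)   # saturating add
    if mode == "BLEND_SUB":
        return max(dst - src, 0)     # saturating subtract
    if mode == "BLEND_MULT":
        # Assumption: approximate normalized multiply; the real
        # implementation may round differently.
        return (dst * src) // 255
    if mode == "BLEND_MIN":
        return min(dst, src)
    if mode == "BLEND_MAX":
        return max(dst, src)
    raise ValueError(f"unknown blend mode: {mode}")


def blend_pixel(mode: str, dst: tuple, src: tuple) -> tuple:
    """Apply the blend mode to each channel of an RGB pixel."""
    return tuple(blend_channel(mode, d, s) for d, s in zip(dst, src))


# Colors taken from the test program in this PR description.
print(blend_pixel("BLEND_ADD", (132, 33, 200), (24, 24, 24)))  # (156, 57, 224)
```

A SIMD filler performs the same operation on several pixels per instruction instead of one channel at a time.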

Results:
From my testing, SSE2 is about 2x slower than AVX2 (which makes sense, since it processes half as many pixels at a time), but it is still much faster than the current single-pixel implementation and about as fast as the AVX2 blit with a cached color surface, so I'd say that's a win either way.
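The "half the pixels at a time" reasoning can be sketched with some back-of-the-envelope arithmetic, assuming 32-bit pixels, a 128-bit SSE2 register, and a 256-bit AVX2 register. The function names are hypothetical, and a real filler handles leftover pixels and row pitch separately, so treat this as an estimate only.

```python
PIXEL_BITS = 32  # assumption: 4 bytes per pixel


def pixels_per_register(register_bits: int) -> int:
    """Number of whole pixels that fit in one SIMD register."""
    return register_bits // PIXEL_BITS


def loop_iterations(width: int, height: int, register_bits: int) -> int:
    """Rough count of SIMD iterations needed to touch every pixel once."""
    total_pixels = width * height
    per_register = pixels_per_register(register_bits)
    # Ceiling division: a partial final group still costs one iteration.
    return -(-total_pixels // per_register)


# The benchmark in this PR fills a 500x500 surface.
sse2_iters = loop_iterations(500, 500, 128)
avx2_iters = loop_iterations(500, 500, 256)
print(sse2_iters, avx2_iters, sse2_iters / avx2_iters)
```

Under these assumptions SSE2 needs twice as many iterations as AVX2, which is consistent with the roughly 2x difference reported above.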
ON MAIN

Flag: BLEND_ADD
fill: 1.8567257939238243
blit: 0.022120719999827853
--------------------
Flag: BLEND_SUB
fill: 1.8767017999998643
blit: 0.021530719999827853
--------------------
Flag: BLEND_MULT
fill: 1.7614726199999495
blit: 0.03417129999997996
--------------------
Flag: BLEND_MIN
fill: 1.82431628000013
blit: 0.02125239999986661
--------------------
Flag: BLEND_MAX
fill: 1.872223340000346
blit: 0.021590140000080284
--------------------

WITH THIS PR

Flag: BLEND_ADD
fill: 0.03238646000099834
blit: 0.021517279999534365
--------------------
Flag: BLEND_SUB
fill: 0.022462600001017564
blit: 0.021596119999594522
--------------------
Flag: BLEND_MULT
fill: 0.05370256000023801
blit: 0.034474519999639595
--------------------
Flag: BLEND_MIN
fill: 0.02288747999991756
blit: 0.02128549999979441
--------------------
Flag: BLEND_MAX
fill: 0.022593719998258165
blit: 0.021513199999753853
--------------------
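To make the two result sets easier to compare, the following sketch computes the fill speedup per flag from the numbers reported above (values rounded to six significant digits). The dictionary and variable names are just illustrative.

```python
# Fill timings in seconds, copied from the results above (rounded).
fill_main = {
    "BLEND_ADD": 1.85673,
    "BLEND_SUB": 1.87670,
    "BLEND_MULT": 1.76147,
    "BLEND_MIN": 1.82432,
    "BLEND_MAX": 1.87222,
}
fill_pr = {
    "BLEND_ADD": 0.0323865,
    "BLEND_SUB": 0.0224626,
    "BLEND_MULT": 0.0537026,
    "BLEND_MIN": 0.0228875,
    "BLEND_MAX": 0.0225937,
}

# Speedup = time on main divided by time with this PR.
speedups = {flag: fill_main[flag] / fill_pr[flag] for flag in fill_main}

for flag, factor in speedups.items():
    print(f"{flag}: {factor:.1f}x faster")
```

By this measure, every flag is at least 30x faster with this PR, and `BLEND_MULT` shows the smallest gain.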

Test Program:

from timeit import repeat

import pygame

pygame.init()

surf = pygame.Surface((500, 500))
surf.fill((132, 33, 200))

color = pygame.Surface((500, 500))
color.fill((24, 24, 24))

flags = [
    "BLEND_ADD",
    "BLEND_SUB",
    "BLEND_MULT",
    "BLEND_MIN",
    "BLEND_MAX",
]

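# Expose the module globals (pygame, surf, color) to the timed statements.
G = globals()

for flag in flags:
    print(f"Flag: {flag}")

    # Time a blended fill with a constant color.
    # Each sample is the best of 10 runs of 1000 calls; 5 samples are averaged.
    teststr = "surf.fill((24, 24, 24), None, pygame." + flag + ")"
    times = [min(repeat(teststr, globals=G, number=1000, repeat=10)) for _ in range(5)]
    print(f"fill: {sum(times) / len(times)}")

    # Time the equivalent blit of a pre-filled (cached) color surface.
    teststr = "surf.blit(color, (0, 0), None, pygame." + flag + ")"
    times = [min(repeat(teststr, globals=G, number=1000, repeat=10)) for _ in range(5)]
    print(f"blit: {sum(times) / len(times)}")
    print("-" * 20)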

@itzpr3d4t0r itzpr3d4t0r added the Performance (Related to the speed or resource usage of the project), SIMD, and Surface (pygame.Surface) labels Nov 12, 2023
@itzpr3d4t0r itzpr3d4t0r requested a review from a team as a code owner November 12, 2023 13:27
@itzpr3d4t0r itzpr3d4t0r marked this pull request as draft November 12, 2023 13:35
@itzpr3d4t0r itzpr3d4t0r marked this pull request as ready for review November 12, 2023 14:48

@MyreMylar MyreMylar left a comment

LGTM 👍

I can follow all the SIMD logic and I see the same speedup locally with this PR over main as described in the PR comments. Makes a big difference! Only noticed one small indentation thing that seemed off. I guess the auto-formatter didn't catch it because it was inside a pre-processor macro.

@MyreMylar MyreMylar added this to the 2.4.0 milestone Nov 26, 2023

@Starbuck5 Starbuck5 left a comment

I like the comparisons against a single-color surface in the PR description, although I think it's using the SDL blit, not the AVX blit, since the surfaces you make in the test script don't have alpha.

I was just talking to somebody about how to implement a "fade to white" effect, and I think with this merged the MULT fillers would officially be a more efficient way to do that than a white surface and set_alpha.

Anyways, the code looks good to me.

@Starbuck5 Starbuck5 merged commit 6111af4 into pygame-community:main Dec 11, 2023
29 checks passed
@itzpr3d4t0r itzpr3d4t0r deleted the add-sse2-fillers branch December 14, 2023 10:15
@itzpr3d4t0r itzpr3d4t0r mentioned this pull request Jan 21, 2024